Deepview: Virtual Disk Failure Diagnosis and Pattern Detection for Azure

Authors

  • Qiao Zhang
  • Guo Yu
  • Chuanxiong Guo
  • Yingnong Dang
  • Nick Swanson
  • Xinsheng Yang
  • Randolph Yao
  • Murali Chintalapati
  • Arvind Krishnamurthy
  • Thomas Anderson
Abstract

In Infrastructure as a Service (IaaS), virtual machines (VMs) use virtual hard disks (VHDs) provided by a remote storage service over the network. Because VMs are separated from their VHDs, a new type of failure, called VHD failure, which may be caused by various components in the IaaS stack, becomes the dominant factor reducing VM availability. Current state-of-the-art approaches fall short in localizing VHD failures because they look only at individual components. In this paper, we design and implement a system called Deepview for VHD failure localization. Deepview composes a global picture of the system by connecting all the components together, using individual VHD failure events. It then uses a novel algorithm that integrates Lasso regression and hypothesis testing for accurate and timely failure localization. We have deployed Deepview at Microsoft Azure, one of the largest IaaS providers. Deepview reduced the number of unclassified VHD failure events from tens of thousands to several hundred. It unveiled new patterns, including unplanned top-of-rack switch (ToR) reboots and storage gray failures. Deepview reduced the time-to-detection for incidents to under 10 minutes. Deepview further helped us quantify, for the first time, the implications of some key architectural decisions, including ToR switches as a single point of failure and compute-storage separation.
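
As a rough illustration of how Lasso regression and hypothesis testing can be combined for failure localization, the following is a minimal sketch and not Deepview's actual algorithm, whose model is not given in this abstract. It assumes a toy setting in which each VHD path crosses one compute cluster and one storage cluster, and the observed failure rate on a path is modeled as the sum of non-negative per-component blame scores. The component names, the failure rates, the penalty `lam`, and the choice of Welch's t statistic are all hypothetical.

```python
import math
import statistics


def lasso_nonneg(X, y, lam, iters=200):
    """Coordinate descent for min (1/2n)*||y - X b||^2 + lam*sum(b), with b >= 0."""
    n, p = len(X), len(X[0])
    beta = [0.0] * p
    for _ in range(iters):
        for j in range(p):
            rho = 0.0  # correlation of column j with the residual that excludes j
            z = 0.0    # squared norm of column j
            for i in range(n):
                partial = sum(X[i][k] * beta[k] for k in range(p) if k != j)
                rho += X[i][j] * (y[i] - partial)
                z += X[i][j] ** 2
            # Soft-threshold, then clip at zero (blame cannot be negative).
            beta[j] = max(0.0, rho / n - lam) / (z / n) if z > 0 else 0.0
    return beta


def welch_t(a, b):
    """Welch's t statistic for mean(a) - mean(b); an assumed stand-in test."""
    diff = statistics.mean(a) - statistics.mean(b)
    se2 = statistics.variance(a) / len(a) + statistics.variance(b) / len(b)
    if se2 == 0.0:
        return math.copysign(math.inf, diff) if diff else 0.0
    return diff / math.sqrt(se2)


# Hypothetical components and paths: row i marks the components on path i.
components = ["compute-A", "compute-B", "storage-1", "storage-2"]
X = [
    [1, 0, 1, 0],  # compute-A -> storage-1
    [1, 0, 0, 1],  # compute-A -> storage-2
    [0, 1, 1, 0],  # compute-B -> storage-1
    [0, 1, 0, 1],  # compute-B -> storage-2
]
y = [0.01, 0.30, 0.01, 0.31]  # made-up VHD failure rates per path

# Step 1: sparse blame assignment via Lasso.
beta = lasso_nonneg(X, y, lam=0.01)
suspect = max(range(len(beta)), key=beta.__getitem__)

# Step 2: check that paths through the suspect really fail more often.
through = [y[i] for i in range(len(y)) if X[i][suspect]]
around = [y[i] for i in range(len(y)) if not X[i][suspect]]
t_stat = welch_t(through, around)

print(components[suspect], beta, t_stat)
```

In this toy data both paths through `storage-2` show elevated failure rates regardless of the compute cluster, so the L1 penalty concentrates the blame on that single component instead of spreading it across the compute clusters. The t statistic then serves as a sanity check before the component is flagged.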

Publication date: 2018